library(nycflights13)
library(tidyverse)
library(ggplot2)
library(dplyr)
library(plotly)
HW 2
Problem 1
# months 12, 1, 7, 8
# carriers UA, AA, DL
# distance > 700
flights_sml <- flights %>%
  filter(month %in% c(12, 1, 7, 8) & carrier %in% c("UA", "AA", "DL") & distance > 700)
Problem 1.a
For each combination of the values of carrier and month, obtain the average arr_delay and obtain the average distance. Plot the average arr_delay against the average distance, use carrier as facet; add a title “Base plot” and center the title in the plot. This will be your base plot, say, as object p. Show the plot p
df1a <- flights_sml %>%
  group_by(carrier, month) %>%
  summarise(avg_arr_delay = mean(arr_delay, na.rm = TRUE),
            avg_distance = mean(distance, na.rm = TRUE))
`summarise()` has grouped output by 'carrier'. You can override using the
`.groups` argument.
df1a
# A tibble: 12 × 4
# Groups: carrier [3]
carrier month avg_arr_delay avg_distance
<chr> <int> <dbl> <dbl>
1 AA 1 1.19 1404.
2 AA 7 4.28 1376.
3 AA 8 -2.51 1378.
4 AA 12 7.52 1412.
5 DL 1 -4.04 1314.
6 DL 7 15.3 1357.
7 DL 8 1.31 1352.
8 DL 12 6.22 1324.
9 UA 1 3.72 1598.
10 UA 7 10.2 1708.
11 UA 8 3.63 1722.
12 UA 12 14.0 1655.
p <- ggplot(df1a, aes(x = avg_arr_delay, y = avg_distance)) +
  geom_point() +
  facet_wrap(~carrier) +
  # x is mapped to the average arrival delay and y to the average distance, so the labels follow that order
  labs(title = "Base plot", x = "Average Arrival Delay", y = "Average Distance") +
  theme(plot.title = element_text(hjust = 0.5))
p
Problem 1.b
Modify p as follows to get plot p1:
- connect the points for each carrier via one type of dashed line
- code the 3 levels of carrier as \(\alpha_1\), \(\beta_{1, 2}\), and \(\gamma^{[0]}\) respectively, and display them in the strip texts
- change the legend title into “My \(\zeta\)” (this legend is induced when you connect points for each carrier by a type of line), and put the legend in horizontal direction at the bottom of the plot
- add a title “With math expressions” and center the title in the plot
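A quick way to check the strip labels before building p1 is to parse the plotmath strings directly. The sketch below is an illustration, not part of the original solution; the names strip_labels and strip_exprs are hypothetical. It assumes that label_parsed() treats each strip label as a string to be parsed into a plotmath expression, so a string that fails to parse here would also fail inside the facet strips.

```r
# Hypothetical check (not part of the original solution): the three carrier
# labels written as plotmath strings, in the alphabetical order AA, DL, UA
strip_labels <- c("alpha[1]", "beta[list(1, 2)]", "gamma^'[0]'")

# parse() turns each string into an unevaluated expression; a malformed
# label would raise an error here rather than inside ggplot2
strip_exprs <- parse(text = strip_labels)
print(strip_exprs)

# Draw the three labels on a blank base-graphics canvas to see how they render:
# alpha with subscript 1, beta with subscript "1, 2", gamma with superscript "[0]"
plot.new()
text(x = c(0.2, 0.5, 0.8), y = 0.5, labels = strip_exprs, cex = 2)
```

Writing beta[list(1, 2)] rather than beta[1][2] matters: the latter nests the 2 as a subscript of the subscript instead of producing the single subscript "1, 2".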
# plotmath labels for the carrier levels in alphabetical order (AA, DL, UA):
# beta[list(1, 2)] gives the subscript "1, 2" and gamma^"[0]" gives the superscript "[0]"
b_str <- c(expression(alpha[1]), expression(beta[list(1, 2)]), expression(gamma^"[0]"))
# factor() stores the labels as deparsed strings, which label_parsed turns back into plotmath
df1a$DF <- factor(df1a$carrier, labels = b_str)
p1 <- ggplot(df1a, aes(x = avg_arr_delay, y = avg_distance)) +
  geom_point() +
  geom_line(aes(group = carrier, linetype = carrier)) +
  facet_wrap(~DF, labeller = label_parsed) +
  labs(title = "With math expressions", x = "Average Arrival Delay", y = "Average Distance",
       linetype = expression(My~zeta)) +
  theme(plot.title = element_text(hjust = 0.5),
        legend.position = "bottom", legend.direction = "horizontal") +
  scale_linetype_discrete(labels = b_str)
p1
Problem 1.c
Modify p1 as follows to get plot p2:
- set the font size of the strip text to 12 and rotate the strip text counterclockwise by 15 degrees
- set the font size of the x axis text to be 10 and rotate the x axis text clockwise by 30 degrees
- set the x axis label as “\(\hat{\mu}\) for mean arrival delay”
- Add a title “with front and text adjustments” and center the title in the plot
p2 <- p1 +
  # in element_text(), positive angles rotate counterclockwise and negative angles rotate clockwise
  theme(strip.text = element_text(size = 12, angle = 15),
        axis.text.x = element_text(size = 10, angle = -30),
        plot.title = element_text(hjust = 0.5)) +
  labs(x = expression(hat(mu)~"for mean arrival delay"), title = "with front and text adjustments")
p2
Problem 2
This problem requires you to visualize the binary relationship between members of a karate club as an undirected graph. Create a graph for karate. Once you obtain the graph, you will see each vertex is annotated by a number or letter. What do the numbers or letters refer to? Do you see the subgraphs of the graph? If so, what do they mean?
library(igraphdata)
library(igraph)
Attaching package: 'igraph'
The following object is masked from 'package:plotly':
groups
The following objects are masked from 'package:dplyr':
as_data_frame, groups, union
The following objects are masked from 'package:purrr':
compose, simplify
The following object is masked from 'package:tidyr':
crossing
The following object is masked from 'package:tibble':
as_data_frame
The following objects are masked from 'package:stats':
decompose, spectrum
The following object is masked from 'package:base':
union
data(karate)
vertex.attributes(karate)
$Faction
[1] 1 1 1 1 1 1 1 1 2 2 1 1 1 1 2 2 1 1 2 1 2 1 2 2 2 2 2 2 2 2 2 2 2 2
$name
[1] "Mr Hi" "Actor 2" "Actor 3" "Actor 4" "Actor 5" "Actor 6"
[7] "Actor 7" "Actor 8" "Actor 9" "Actor 10" "Actor 11" "Actor 12"
[13] "Actor 13" "Actor 14" "Actor 15" "Actor 16" "Actor 17" "Actor 18"
[19] "Actor 19" "Actor 20" "Actor 21" "Actor 22" "Actor 23" "Actor 24"
[25] "Actor 25" "Actor 26" "Actor 27" "Actor 28" "Actor 29" "Actor 30"
[31] "Actor 31" "Actor 32" "Actor 33" "John A"
$label
[1] "H" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15"
[16] "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30"
[31] "31" "32" "33" "A"
$color
[1] 1 1 1 1 1 1 1 1 2 2 1 1 1 1 2 2 1 1 2 1 2 1 2 2 2 2 2 2 2 2 2 2 2 2
# one row per faction, with the numeric color code igraph uses for its vertices
legend_df <- data.frame(attr = unique(vertex_attr(karate, "Faction")), color = unique(V(karate)$color))
legend_df <- legend_df[order(legend_df$attr), ]
plot.igraph(karate)
# color codes 1 and 2 are drawn in orange and sky blue by igraph's default palette
legend(x = "topright", legend = legend_df$attr, col = c("orange", "skyblue"), title = "Faction", bty = "n", pch = 19)
The numbers and letters on the vertices are labels that identify each member (“actor”) of the club. The call to vertex.attributes(karate) shows the names “Mr Hi”, “Actor 2”, “Actor 3”, …, “Actor 33”, “John A” together with the matching labels “H”, “2”, “3”, …, “33”, “A”. The graph does not break into disconnected pieces, but it does contain two clearly visible subgroups. They represent the two factions that formed when the karate club split into two separate clubs, and the vertex colors show who belongs to each one: the group led by John A (“A” in the graph) in blue, and the group led by Mr Hi (“H” in the graph) in orange.
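The statement about the two factions can also be checked numerically. The following sketch is not part of the original solution; the names comp, faction, edge_ends, same_faction, and faction_graphs are hypothetical, and it assumes the igraph and igraphdata packages are installed. It checks that karate is a single connected component, counts edges that join members of the same faction versus different factions (if the factions are the visible subgroups, one would expect most edges to fall within a faction), and extracts one induced subgraph per faction.

```r
library(igraph)
library(igraphdata)

data(karate)

# Number of connected components: a value of 1 means the two factions are
# subgroups of one graph rather than disconnected pieces
comp <- components(karate)
print(comp$no)

# Faction (1 or 2) of every vertex, and the two endpoints of every edge as vertex ids
faction <- V(karate)$Faction
edge_ends <- ends(karate, E(karate), names = FALSE)

# An edge lies "within" a faction when both endpoints share the same Faction value
same_faction <- faction[edge_ends[, 1]] == faction[edge_ends[, 2]]
print(table(ifelse(same_faction, "within faction", "between factions")))

# Induced subgraph for each faction, e.g. to plot the two groups separately
faction_graphs <- lapply(1:2, function(f) induced_subgraph(karate, which(faction == f)))
print(sapply(faction_graphs, vcount))
```

Working with vertex ids (names = FALSE) keeps the faction lookup a plain vector index, so the same code works regardless of how the vertices are named.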
Problem 3
This problem requires you to create an interactive plot using plotly.
Create an interactive scatter plot between hwy and displ with the following:
- color by cyl
- x-axis label as “engine displacement in liters” and y-axis label as “highway miles per gallon”
- legend title as “number of cylinders” and adjust the vertical position of the legend if you can
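Whether cyl produces a discrete legend or a continuous color scale depends on how the column is stored. The quick check below is not part of the original solution; the name cyl_levels is hypothetical, and it assumes only that ggplot2 (which ships the mpg data) is installed.

```r
library(ggplot2)  # ggplot2 ships the mpg data set

# cyl is stored as an integer column, so plotting libraries treat it as a
# continuous variable unless it is converted with factor()
print(class(mpg$cyl))

# only a handful of distinct values occur, which suits a discrete legend
print(sort(unique(mpg$cyl)))

# converting to a factor gives one level (and one legend entry) per cylinder count
cyl_levels <- levels(factor(mpg$cyl))
print(cyl_levels)
```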
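# cyl is stored as an integer, so it is mapped as a factor to get a discrete legend instead of a colorbar
p3 <- plot_ly(mpg, x = ~displ, y = ~hwy, color = ~factor(cyl), type = "scatter", mode = "markers") %>%
  layout(xaxis = list(title = "engine displacement in liters"),
         yaxis = list(title = "highway miles per gallon"),
         # the legend title takes a list with a `text` element; y moves the legend vertically (0 = bottom, 1 = top)
         legend = list(title = list(text = "number of cylinders"), y = 0.5))
p3
# this plot is another version of the above but uses ggplotly
p3a <- ggplot(mpg, aes(x = displ, y = hwy, color = cyl)) +
  geom_point() +
  labs(x = "engine displacement in liters", y = "highway miles per gallon", color = "number of cylinders") +
  theme(legend.justification = c(0, 1), legend.position = c(0.025, 0.975))
ggplotly(p3a)
I have both versions in here. In the plot_ly version, the legend title is set with layout(legend = list(title = list(text = ...))), which takes effect once cyl is mapped as a factor; with a numeric cyl, plot_ly draws a continuous colorbar rather than a legend. In the ggplotly version the title comes from labs(color = ...), but the legend position set with theme() did not carry over to the interactive plot.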